ABSTRACT


This thesis examines spectrometric oil analysis data 
from two sources in an attempt to formulate a statistical 
model which will be useful in monitoring aircraft engines 
in the Naval Oil Analysis Program. Initially, experimental 
data, gathered for an Air Force study, is used to determine 
if the measurement error inherent in the monitoring 
procedure is normally distributed and if correlations exist 
between measurements for different wear metals. Based on 
the results of this investigation, a study is made of 
operational data from Wright reciprocating engines of the 
R1820-82 model type. This investigation leads to the 
conclusion that a multivariate regression model is useful 
in estimating the parameters of the distribution of analyses 
from properly operating engines of this type. A procedure 
is then suggested which would employ the readings from past 
oil analyses from a particular engine to determine its 


present condition.


I. INTRODUCTION


For several years the technique of using the spectro- 
metric analysis of oil samples as an aid in determining the 
condition of diesel engines has been employed successfully 
by major railroads and various other users of large diesel 
equipment. In 1956, a trial program was begun at the Naval 
Air Rework Facility in Pensacola to determine if this 
method could also be used to monitor aircraft engines. 
Since that time the program has proved successful and has 
evolved into the Naval Oil Analysis Program (NOAP). It is 
planned that this program will eventually include all Navy 
fluid lubricated mechanical systems. A more detailed 
history of the program is contained in Refs. 1 and 2. 

Since this thesis is concerned with an investigation of 
data collected at the Pensacola laboratory, the following 
descriptions of the operation will be limited to the pro- 
cedures used there. Reciprocating aircraft engines are 
sampled approximately every 30 hours. The sample is taken 
after the aircraft has returned from a flight and before 
the oil has become cold. It is immediately sent to the 
laboratory by air mail and is analyzed on the day received. 
The analysis is accomplished by a spectrometer using the 
rotating graphite electrode technique. Measurements of the 
parts per million (ppm) content of ten metallic elements, 


which might be indicative of engine wear, are made
simultaneously. Of these ten, aluminum, chromium, iron,
silver, copper, magnesium and nickel are those which are
relevant to engines of the model considered in this report.
The ppm readings are automatically recorded on a punched 
card which also contains various other hand-entered data 
identifying the sample. 

Once the data has been recorded, it is used to aid in 
determining what the operating condition of the engine 
might be. Presumably, if the engine is operating properly, 
the amount of metallic contamination in the circulating oil
should be within certain normal limits. In addition, it is
felt that the amount of contamination added to the oil
since it was last sampled should be within specific limits
if the engine is in good working condition. If, however, 
the engine is discrepant and excessive wear is present, 
this will presumably cause an abnormal addition of metallic 
contaminates to the circulating oil. 

Thus, when a sample has been analyzed and the results 
recorded, both the magnitudes of the present readings and 
the changes in the readings since the last sampling are 
compared with threshold limits which have been developed 
for each engine type and each metallic element relevant to
that engine. If the results fall outside the prescribed
limits, some action is generally taken by the laboratory.
Usually another sample is requested and the previous 


results are verified. If the abnormality persists, either 


the aircraft is grounded for maintenance or future samples 
are taken more frequently than the usual 30 hour interval. 

At present, these threshold limits are subjectively set 
and vary only from element to element and among engine 
model types. They are based on the past history of the 
aircraft model which includes the data supplied by the 
engine manufacturer before the model is placed in service
and experience accumulated once the model is in use. The 
limits are not used as sharp boundaries for classifying 
engines as normal or discrepant but merely as indicators 
upon which a subjective decision as to the action to be 
taken can be based. 

This report examines two sets of data from the Pensa- 
cola laboratory with the intention of determining the 
propriety of three assumptions implicit in this classifi- 
cation procedure. Since the same threshold limits are 
used for all engines of a particular model, it is assumed 
that all normally operating engines of the same type can be 
expected to have the same amounts of metallic contamination 
in their oil systems. In addition, since threshold limits 
are constant for a given element and model type, variations 
in other factors, such as the operating hours since the 
last oil change, must be ignored or subjectively introduced 
into the classification procedure. Finally, since threshold
limits are set for each element independent of the
limits for other elements, readings for different elements 


are assumed to be uncorrelated. 


Once these assumptions are verified or rejected, a 
statistical model is formulated to aid in establishing a 


more objective classification criterion. 


II. ERRORS INHERENT IN THE MONITORING PROCEDURE


Since the intention of NOAP is to make inferences 
about the condition of aircraft engines, based on the 
amount of wear metal contamination in the engine's oil 
system, it is extremely important that the amount of con- 
tamination recorded at the laboratory accurately reflect 
the actual amount present in the engine. For the purposes 
of this report, measurement error will be defined as the 
difference between the parts per million content of an 
element recorded as present in a particular engine at a 
point in time and the actual content at that time. In NOAP 
there are a variety of potential sources of error, all of 
which can contribute to the net measurement error defined 


above. 


A. ERRORS IN SAMPLING 

As was mentioned earlier, oil samples are taken from 
reciprocating engines normally every 30 flying hours and 
while the oil is still hot. This sampling is accomplished 
with a special sampling kit consisting of a sampling tube 
and a sampling bottle. The tube is inserted into the oil 
reservoir, and when it has filled the top end is stopped 
with the operator's finger. The contents are then trans- 
ferred to the bottle, which is immediately forwarded to 
the laboratory for analysis. When the sample is analyzed, 


a small portion of the oil in the bottle is used in the 


analysis [Refs. 1 and 2]. Thus, an extremely small amount 
of oil is used to determine the extent of contamination in 
the engine's entire oil system. Any lack of homogeneity 

in the engine's oil reservoir will result in a non- 
representative sample. Further, any contamination added 

to the sample through a lack of cleanliness of the sampling 
tube and bottle or through handling at the laboratory will 


contribute to the measurement error.


B. ERRORS IN RECORDING 

At the time the sample is taken, certain data including 
the date, the operating hours since the last oil change, 
the hours since the last overhaul of the engine, the engine 
serial number and the model number are recorded and mailed 
to the laboratory with the sample. Various portions of 
this data are transferred from other records. At the 
laboratory the data are entered by hand on the permanent 
record cards maintained there [Ref. 1]. This entire 
sequence of recording and transferring data from one record
to another can result in mistakes. 

Unfortunately, as with the errors in sampling, there is 
no data available at the present time that can be used to 


measure this error. 


C. ERRORS IN ANALYSIS
When an oil sample is received at the Pensacola labora- 
tory, it is analyzed using a direct reading spectrometer 


with spark excitation, stationary and rotating disc
electrodes. The sample bottle cap is filled with oil and
placed in the spark stand. The gap between the two 
electrodes is set and the disc electrode begins to rotate 
at 30 rpm. As the electrode rotates, a thin film of oil 
is forced to the area under the fixed electrode. A high 
energy spark is then fired across the gap and the film of 
oil is burned for 25 seconds. The light from the burning 
oil is separated so that its intensity at the wave lengths
produced by the elements to be analyzed can be compared
with built-in standards. The average intensity over the
burning period is then measured for each element simul- 
taneously and converted into parts per million. These 
readings are automatically recorded on the engine history 
card [Ref. 2]. 

If it is assumed that there were no errors in the 
sampling or recording and thus, that the oil used in the 
analysis is representative of the oil in the engine, any
difference between the true content of contamination in 
the engine and that recorded after the analysis can be 
attributed to an analysis error. An experiment designed 
to measure the effects of this type of error has been 
conducted and the results of an analysis of this data are 


presented in the next section. 




III. EXAMINATION OF AIR FORCE DATA


Although the error due to the spectrometric analysis 
of the oil is not the only possible source of error, it 
certainly is a major contributor to the over-all measurement
error defined earlier. For this reason, an examination
of data accumulated for a study conducted by the Air Force 
[Ref. 3] was performed and the results are discussed in 


this section.


A. DATA 

In 1967, the Pensacola laboratory participated in an 
experiment conducted by the Air Force. During a 30 day 
period the laboratory received 100 oil samples. These 
were to be analyzed in the normal manner and the results 
reported. Although the laboratory was not aware of it, 
these 100 samples consisted of ten samples each repeated 
ten times. Thus, the laboratory actually repeated the 
analysis of ten different samples ten times. The results 
of the analyses, for the seven elements of interest in 
this report, are included in the Computer Output section, 
where the readings on like samples are in groups numbered 
from one to ten. Missing data accounts for some groups 
having less than ten readings. 

For each of the ten groups of repetitious analyses, the 
sample mean and standard deviation were calculated for each 


element. If, for example, $x_i$ is the $i$th reading for
aluminum in a group of size $n$, then the sample mean, $\bar{X}$,
and standard deviation, $S$, for aluminum are

$$\bar{X} = \Big(\sum_{i=1}^{n} x_i\Big) / n$$

and

$$S = \Big[\sum_{i=1}^{n} (x_i - \bar{X})^2 / (n-1)\Big]^{1/2} \tag{1}$$

respectively. The results of these calculations are also 


presented in the Computer Output section for each element 
and each group of samples. 

In addition, for each sample group, an estimated correlation
matrix, $R$, was calculated, where $r_{ij}$, the
element in the $i$th row and $j$th column of $R$, is given by

$$r_{ij} = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{X}_i)(x_{jk} - \bar{X}_j)}{\Big[\sum_{k=1}^{n} (x_{ik} - \bar{X}_i)^2 \, \sum_{k=1}^{n} (x_{jk} - \bar{X}_j)^2\Big]^{1/2}} \tag{2}$$

where $x_{ik}$ is the $k$th reading for the $i$th element, and $\bar{X}_i$
is the element's sample mean. These correlation matrices
are included in the Computer Output section. 
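
The group statistics of equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names and the readings are hypothetical and are not taken from the Air Force data.

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_std(xs):
    """Equation (1): root of the summed squared deviations over n - 1."""
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def correlation(xs, ys):
    """Equation (2): sample correlation between two elements' readings."""
    mx, my = sample_mean(xs), sample_mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

def correlation_matrix(readings):
    """readings: dict mapping element name -> list of repeated ppm readings."""
    names = list(readings)
    return {(a, b): correlation(readings[a], readings[b])
            for a in names for b in names}

# Hypothetical repeated readings (ppm) on one sample; not data from the study.
group = {"iron": [10.0, 12.0, 11.0, 13.0], "copper": [5.0, 6.0, 5.5, 6.5]}
R = correlation_matrix(group)
```

In this made-up group the copper readings are exactly half the iron readings, so the estimated correlation between the two elements is one.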

These preliminary computations provided statistics 
which were used to test certain hypotheses concerning the 


probability distribution of the analysis error. 




B. TEST FOR NORMALITY 

Since the overall measurement error is the net effect
of errors arising from a variety of sources, the Central 
Limit Theorem of Probability Theory [Ref. 4] provides good 
justification for making an assumption of normality in the 
distribution of this error. Thus, any additional evidence, 
which tends to support the normality of one of the contributing
sources of error, will serve to strengthen the
overall assumption. The experiment conducted by the Air 
Force provided data which was used to test for normality 
in the distribution of the analysis error. 

If $X_k$ is defined as a seven-component vector of sample
readings arising from sample group $k$, $k = 1,2,\ldots,10$, then

$$X_k = \mu_k + e_k$$

where $\mu_k$ is a seven-component vector of the true metallic
content of seven elements in the sample associated with the
$k$th group of readings, and $e_k$ is the seven-component random
analysis error vector. Thus, if it is assumed that $e_k$ is
a multivariate normal random variable with zero mean vector
and unknown covariance matrix $\Sigma_k$, then $X_k$ is a multivariate
normal random variable with mean vector $\mu_k$ and covariance
matrix $\Sigma_k$, denoted $N(\mu_k, \Sigma_k)$.


If this assumption is correct, it is possible to perform 


a transformation of the form,

$$Z_k = P_k (X_k - \mu_k),$$

which will produce a multivariate normal random variable $Z_k$
which has mean vector zero and covariance matrix I, the 
identity matrix. For details of this transformation see 
Appendix A. 

For each of the ten groups of sample readings the mean
vector $\mu_k$ was estimated using

$$\bar{X}_k = \frac{1}{n_k} \sum_{j=1}^{n_k} X_{kj} \tag{3}$$

where $X_{kj}$ is the $j$th vector of sample readings from the $k$th
sample group, and $n_k$ is the number of readings in that
group. In addition, the covariance matrices, $\Sigma_k$, were
estimated using the unbiased estimator

$$\hat{\Sigma}_k = (S_i S_j r_{ij}), \tag{4}$$

where $S_i$ is the estimated standard deviation for the $i$th
element, computed as in equation (1), and $r_{ij}$ is as defined
by equation (2). For each of the ten groups the nonsingular
matrix, $P_k$, was found and the transformation,

$$Z_{kj} = P_k (X_{kj} - \bar{X}_k),$$

performed on each vector of readings, $X_{kj}$, in the $k$th
group, $k = 1,2,\ldots,10$. In this way vectors $Z_{kj}$ were
produced, the components of which are stochastically independent
and are distributed according to N(0,1) if the
hypothesis is true.




All readings from the ten groups were pooled to
produce a sample of 651 deviates, assumed to be univariate 
normal. The Kolmogorov-Smirnov goodness of fit test was 
applied to this sample and the resulting test statistic of 
.0215 was not significant even at the .20-level. Thus, the 
hypothesis of normality in the distribution of the analysis 


error was accepted. For details of the test see Appendix A. 


C. TEST FOR EQUALITY OF COVARIANCE MATRICES 


Let $X_k$ again be defined as

$$X_k = \mu_k + e_k$$

as in the previous section, where now $e_k$ is assumed to be
a multivariate normal random vector. In addition, let the
estimators $\bar{X}_k$ and $\hat{\Sigma}_k$ be defined by equations (3) and (4)
respectively. Then, the Air Force data can be used to
produce ten estimated covariance matrices $\hat{\Sigma}_k$, each associated
with a different sample group and thus, a different
true content vector $\mu_k$. If the true covariance matrices,
$\Sigma_k$, are independent of the vector $\mu_k$ and thus constant
for all $k = 1,2,\ldots,10$, the ten estimated matrices could
be pooled to obtain an over-all estimate of $\Sigma$. This
hypothesis of the equality of the ten covariance matrices 
was tested, and the results led to the rejection of the 
hypothesis at the .10 level of significance. The details 


of the test used and the results obtained are included in 


Appendix A.

In an attempt to account for the apparent variability
among covariance matrices from different sample groups, a
regression model was formulated. It had been suggested by
Baird-Atomic Inc., the manufacturer of the spectrometer
used at the Pensacola laboratory, that the variability in
readings for a given element is dependent on the true
content of the element. Specifically, the relationship
is assumed to be

$$\sigma^2 = a + b\mu^2$$

where $\sigma^2$ is the variance of repeated analyses of the same
sample for a given element, $\mu$ is the true parts per million
content of the element, and a and b are constants [Ref. 5].
The model,

$$S^2 = a + b\bar{X}^2 + e,$$

was used to examine the propriety of this relationship for
each of the seven elements under consideration. Here $S^2$
is the square of the standard deviation estimate defined by
equation (1), $\bar{X}^2$ is the square of the sample mean, and e is
a random variable. For each element, a and b were estimated 
using least-squares techniques and, under the assumption
that variations about the regression line are normally
distributed, the hypothesis, b = 0, was tested [Refs. 6
and 7]. A t-test was used and the significance level set
at .10. Of the seven slopes tested, those associated with
aluminum, iron, copper and magnesium were significantly
non-zero. It should be mentioned that the true content of
the other elements did not vary much among the ten sample 
groups. The results of the least-squares estimation, 
together with the numerical results of the t-tests, are 


included in the Computer Output section. 
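
The least-squares fit and the t-test of the slope described above can be sketched as follows. The function name and the numbers are hypothetical; in particular, the group means and variances shown are invented for illustration and are not the values in the Computer Output section.

```python
import math

def slope_t_test(x, y):
    """Fit y = a + b*x by least squares; return (a, b, t), t testing b = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)             # residual variance on n - 2 d.f.
    t = b / math.sqrt(s2 / sxx)    # refer to Student's t with n - 2 d.f.
    return a, b, t

# Hypothetical group means and variances for one element (made up).
means = [2.0, 4.0, 6.0, 8.0, 10.0]
variances = [0.5, 1.0, 2.1, 3.4, 5.2]
a, b, t = slope_t_test([m ** 2 for m in means], variances)
```

With ten sample groups the statistic has eight degrees of freedom, so a two-tailed test at the .10 level would compare |t| with the tabled value of about 1.86.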


D. TEST FOR INDEPENDENCE AMONG ELEMENTS

Because of the apparent dependence of the variance of 
repeated readings upon the true content of an element, it 
was felt that covariances between elements might depend 
upon the content of the elements concerned. If this were 
the case, then, for example, a particularly high reading 
of one element might be "explained" by a corresponding 
low reading of another element. Under the assumption of 
normality, the hypothesis of independence is equivalent to 
the hypothesis of zero correlation. This hypothesis was 
tested for each of the ten correlation matrices, R,
defined by equation (2). Of the ten tests conducted, only 
three of them were not significant at the .10 level. The 
numerical results and the test used are given in Appendix A. 
It can be noted from the correlation matrices in the 
Computer Output section that particularly strong corre- 
lations seem to exist between iron and copper, silver and 
copper, and magnesium and iron. 

Thus, it appears that, in general, the readings of 
different elements are not independent, and some explana- 


tion, for example, of an erroneous copper reading may come 




from an examination of the corresponding iron reading on 


the same sample.


IV. ANALYSIS OF OPERATIONAL DATA


Using the evidence provided by the Air Force data to 
support the assumption of normality in the distribution
of the measurement error, an investigation of some opera- 
tional data was conducted. A statistical model, which makes 
use of the apparent correlations between the readings on 
different elements, was formulated. The details of the
model and its formulation are discussed in this section.


A. NOAP DATA 

The Naval Air Rework Facility at Pensacola provided a 
magnetic data tape containing the records of operational 
analyses performed there from the beginning of July to the 
end of September in 1967. The records of some 21,000 
different analyses were included on the tape. For each 
analysis the engine model number and serial number, as well 
as the date the analysis was performed and its results in 
parts per million for each relevant element, are recorded. 
In addition, it includes the operating hours since the last 
oil change and since the last overhaul of the engine for
each sample. Unfortunately, the action recommended by the 
laboratory after each sample was analyzed and the results 
of that action were not available with the tape. For this 
reason, there was no way of determining with certainty 
which analyses were on oil from properly operating engines 


and which were from discrepant engines. 




Of the 113 different engine models represented on the 
tape, the Wright reciprocating engine model, R1820-82, was 
selected for investigation since it was the most frequently 


sampled, with 4,134 different analyses. 


B. PRELIMINARY RESULTS 

Since it seemed logical to expect the content of 
metallic contamination to show some increase from normal 
wear in a properly operating engine as the hours since the 
last oil change increase, some preliminary plots were made 
by the computer. Six hundred different analyses were used 
with no regard to the particular engine of the R1820-82 
type from which they came. For each of seven elements, 
relevant to the monitoring of engines of this type, the 
computer plotted the ppm content versus the operating hours 
since the last oil change. For three of the elements,
iron, copper and aluminum, there was some indication that a 
buildup of contamination occurs. The other four plots gave 
no evidence of any significant trend. For comparison, the 
plot of chromium is included with those of aluminum, iron, 
and copper in Figures 1 to 4, respectively. Since these 
plots were made on sample readings from a variety of en- 
gines, they did not indicate whether a particular engine 
can be expected to show the same trend. For this reason, 
the five most frequently sampled engines of the R1820-82 
type were selected and the computer was again used to plot 


the data from these engines. Five different symbols,
x, +, A, 9, and Ol, were used on each plot to represent
the five different engines. Thus, both the behavior of the
readings for a given engine and differences among the
readings of the engines could be seen. The evidence

of trends was limited to the three elements iron, copper 
and aluminum. For these elements each of the five engines 
showed a roughly linear increase in the ppm content as the 
hours since the last oil change increased. For comparison, 
the plot for chromium is again included with the plots of 


aluminum, iron and copper in Figures 5 through 8. 


C. REGRESSION MODEL

Because of the evidence of a linear increase in ppm 
content versus an increase in hours since oil change, 
provided by the computer plots, a regression model was 
suggested. With this type of model the expected content of 
a metallic element would change as the hours since the last
oil change vary. Thus, differences between what is a
normal amount of contamination for a properly operating
engine just after its oil has been changed and several
flying hours later could automatically be incorporated
into a classification criterion.

1. Data Selection

Of all the engines of the R1820-82 model type 

represented on the tape, those with eight or more different 
analyses were selected. Of these, any with missing data on 


one or more records, which brought the usable number of
readings to less than eight, was rejected. The remaining
engines were then screened in an attempt to insure that all 
the data had come from properly operating engines. As was 
mentioned earlier, the tape did not provide data on either 
the recommendation made by the laboratory or the results 
of any maintenance which might have been recommended. For 
this reason, it seemed that the best way to insure that none 
of the data had come from a discrepant engine was to exclude 
all readings which exceeded the threshold limits presently 
used. These limits are given in Ref. 1. After this procedure
had been applied to the data, any engine with less
than eight readings was eliminated from further examination. 
At this point there were 27 engines of the R1820-82 type 
remaining, each of which was represented with from eight 
to fourteen readings. This was the data that was used in 
the remainder of the investigation. 
2. Estimation and Tests of Regression Coefficients 
It was assumed that the outcome of each spectro- 


metric analysis on a given engine is of the form 


Y = BX +e 


where $Y$ is a seven-component vector of ppm metallic
contents; $B$ is a 7x2 matrix of unknown coefficients; $X$ is
a two-component vector with first component identically
one and second component equal to the operating hours since
the last oil change; and $e$ is a multivariate normal random
vector with mean vector zero and unknown covariance matrix,
$\Sigma$.

It will be recalled that the examination of the Air
Force data in section III indicated that the covariance
matrix may vary as the mean content vector changes. However,
the slopes of the regression lines mentioned in part
C of that section are such that variations in the mean
vectors of the extent present in the operational data do
not result in an appreciable variation in the elements of
$\Sigma$. For this reason, it will be assumed that the covariance
matrix, $\Sigma$, is constant in the development to follow. Under
this assumption, the matrix $B$ associated with each engine
was estimated using the least-squares estimation technique 
described in Appendix B. 
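
A minimal sketch of the estimation step is given here. It assumes ordinary least squares and relies on the fact that, since every element shares the same two-component vector X, the multivariate fit reduces to a separate straight-line fit for each element. The function names and the readings are hypothetical, and the procedure of Appendix B may differ in detail.

```python
def fit_engine(hours, readings):
    """Least-squares estimate of B for one engine.

    hours:    operating hours since the last oil change, one per sample.
    readings: one ppm vector per sample (one entry per element).
    Returns B as a list of [intercept, slope] rows, one row per element.
    """
    n = len(hours)
    mh = sum(hours) / n
    shh = sum((h - mh) ** 2 for h in hours)
    B = []
    for e in range(len(readings[0])):
        y = [r[e] for r in readings]
        my = sum(y) / n
        slope = sum((h - mh) * (yi - my) for h, yi in zip(hours, y)) / shh
        B.append([my - slope * mh, slope])
    return B

def predict(B, hours):
    """Expected ppm content of each element at the given hours (B X)."""
    return [b0 + b1 * hours for b0, b1 in B]

# Hypothetical two-element record (iron, chromium) for a single engine.
hours = [10.0, 40.0, 70.0, 100.0]
readings = [[12.0, 3.0], [15.0, 3.0], [18.0, 3.0], [21.0, 3.0]]
B = fit_engine(hours, readings)
```

In this invented record iron rises by 0.1 ppm per hour while chromium stays flat, which mirrors the kind of contrast seen in the plots.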

Since the random error vector is assumed to be
$N(0, \Sigma)$, the least-squares estimates of the elements of $B$
for a particular engine are normally distributed. If $B$ is
partitioned by columns as

$$B = (B_1, B_2),$$

then a test of the value of each of the components of $B_2$
can be made to determine whether the variability of the 
readings for a specific element is related to variations 

in the hours since oil change. The details of this test 
are included in Appendix B. For each engine and each 
component of $B_2$ a test of the hypothesis that the component
is equal to zero was made. Table 1 gives the number of
times a specific component was significantly positive or
negative at an over-all $\alpha$-level of .10. For this $\alpha$ value,
the expected number of times in 27 tests the results will
be significantly positive or negative, if the component is
actually zero, is 1.35.
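
The expected count can be checked with one line of arithmetic, under the assumption (not spelled out in the text) that the over-all level of .10 is divided equally between the two tails, so that each direction carries a probability of .05 when the true slope is zero:

$$27 \times \frac{\alpha}{2} = 27 \times 0.05 = 1.35$$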


TABLE 1

NUMBER OF SIGNIFICANT REGRESSION SLOPES
TWO-TAILED TEST, α = .10

B_2                       Number of                 Number of
component    Element      significantly positive    significantly negative
                          components                components
B_21         Aluminum      8                         0
B_22         Iron         17                         [illegible]
B_23         Chromium      3                         1
B_24         Silver        1                         3
B_25         Copper       [illegible]                1
B_26         Magnesium     5                         0
B_27         Nickel        5                         0


These results indicate that the metallic content of 
aluminum, iron and copper in properly operating engines of 
the R1820-82 type tends to increase as the hours since the 
engine's oil was last changed increase. The evidence 
pointing to this conclusion is particularly strong in the
case of iron and copper. Further, there seems to be no
significant indication that such a relationship exists
in general for chromium, silver, magnesium or nickel. The 


numerical results of the tests of the components of $B_2$ for
each engine, in addition to the raw data and estimates of
$B$ and $\Sigma$, are included in the Computer Output section.

Any use of these results in establishing an 
operational classification procedure for identifying dis- 
crepant engines depends upon the estimation of the unknown 
matrix, $\Sigma$, from past analyses. For this reason, it is
important to determine if the observations from different
engines all come from the same over-all probability
distribution. If this is the case, all data on engines of
the R1820-82 model type could be used to estimate a single
matrix $\Sigma$. As a first step in this direction, the model


$$Y_i = B_i X + e_i$$

was used where $Y_i$ is a seven-component vector of readings
on engine $i$; $B_i = (B_{i1}, B_{i2})$ is a 7x2 matrix of coefficients
associated with the $i$th engine and where the components of
$B_{i2}$ associated with chromium, silver, magnesium and nickel
are assumed to be zero; and $e_i$ is $N(0, \Sigma_i)$. The elements
of $B_i$, not assumed to be zero, were estimated as before
and used to estimate the 27 covariance matrices, $\Sigma_i$. The
unbiased estimate of $\Sigma_i$ is

$$\hat{\Sigma}_i = \frac{1}{n_i - 2} \sum_{j=1}^{n_i} (Y_{ij} - \hat{B}_i X_{ij})(Y_{ij} - \hat{B}_i X_{ij})'$$

where $Y_{ij}$ is the $j$th observation of the vector $Y_i$; $\hat{B}_i$ is
the estimate of $B_i$; $X_{ij}$ is the $j$th observation of $X$; and
$n_i$ is the number of observations associated with the $i$th
engine. A test of the hypothesis of equal covariance
matrices was made using these estimates. The test statistic 
was extremely significant at the .10 level, and the hypo- 
thesis of equal covariance matrices was rejected. The test 
used and numerical results are included in Appendix A. 

Since the evidence indicates that the covariance 
matrices associated with readings from different engines
are not the same, the overall conjecture of like distri- 


butions must also be rejected. 




V. CONCLUSIONS


The investigation of the Air Force data and the actual 
analysis records of a three month period from the Pensacola 
laboratory lead to three main conclusions. First, the error 
inherent in the ppm spectrometric readings is multivariate 
normally distributed with significant covariances existing 
between the readings of various pairs of elements. Further, 
there appears to be a linear increase in the content of 
aluminum, copper and iron present in properly operating 
engines of the R1820-82 type as the hours since the last 
oil change increase. Finally, there seems to be no justi-
fication for expecting readings on samples from different 
engines of the R1820-82 type to vary in the same manner. 
Based on these results, an objective classification 
criterion can be formulated which may be of use in improving
the present classification procedure.

For example, all back data on a particular engine of 
the R1820-82 type which was in proper working order could 
be used to estimate the matrix $B$, which in turn could be
used to estimate the covariance matrix $\Sigma$. Then, any
observation, $Y$, of the spectrometric analysis of a new oil
sample from that engine is distributed as $N(BX, \Sigma)$ if the
engine is operating properly. The estimates of $B$ and $\Sigma$
could be used to construct a confidence region $R_\alpha(X)$
[Appendix B]. The region $R_\alpha(X)$ would be constructed so
that, if the engine is operating properly, the reading $Y$
will be contained in the region $R_\alpha(X)$ with probability $1-\alpha$.
Thus, one classification criterion would be: classify the
engine as operating properly if $Y$ is within $R_\alpha(X)$ and
classify as discrepant otherwise. By making $\alpha$ small, say
.01, the number of operational engines, which are mistakenly
classified as discrepant, can be expected to be of the order
of 1 in 100. However, the smaller the parameter $\alpha$ is made,
the larger the region $R_\alpha(X)$ becomes, and thus, the more
likely it is that a discrepant engine will be classified
as operating properly.

For this reason, it may be more appropriate to use
two values of $\alpha$. For example, $\alpha$ could be set at .10 and $\alpha'$
at .01 and two regions $R_\alpha(X)$ and $R_{\alpha'}(X)$ constructed. In
this way a procedure could be used which would 1) classify
the engine as in proper working order if $Y$ is in $R_\alpha(X)$;
2) classify as discrepant if $Y$ is not in $R_{\alpha'}(X)$; and 3)
call for verification of $Y$ or more frequent sampling if $Y$
is in $R_{\alpha'}(X)$ but not in $R_\alpha(X)$.
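
The three-way rule can be illustrated with a short sketch. It rests on assumptions that are not taken from Appendix B: B and Σ are treated as known, each region is taken to be an ellipsoid of constant squared distance (Y - BX)' Σ^{-1} (Y - BX), and the limits are chi-square quantiles. A region that allows for the error in estimating B and Σ would be somewhat larger. All names and numbers are hypothetical.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L' = S, S symmetric positive definite."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def distance_sq(y, mean, S):
    """(y - mean)' S^{-1} (y - mean), by forward substitution with L."""
    L = cholesky(S)
    r = [a - b for a, b in zip(y, mean)]
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in z)

def classify(y, mean, S, inner, outer):
    """inner, outer: squared-distance limits of the alpha and alpha' regions."""
    d2 = distance_sq(y, mean, S)
    if d2 <= inner:
        return "proper"
    if d2 > outer:
        return "discrepant"
    return "verify"

# Hypothetical two-element case. With 2 degrees of freedom the chi-square
# quantile at probability 1 - alpha is exactly -2 ln(alpha).
inner, outer = -2 * math.log(0.10), -2 * math.log(0.01)
mean = [16.0, 6.0]               # assumed expected contents B X
S = [[4.0, 1.0], [1.0, 1.0]]     # assumed covariance matrix
```

For example, `classify([20.0, 6.0], mean, S, inner, outer)` falls between the two limits and so calls for verification, while a reading equal to the expected contents is classified as proper.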

The final selection of a specific classification 
criterion and the setting of the appropriate level(s) of 
$\alpha$ must be done subjectively and should be based upon an
examination of the costs involved. If the cost of classifying
a discrepant engine as operational is much larger
than the cost of grounding an operational aircraft then $\alpha$
should be made appropriately large compared to its value
if the reverse were true.




APPENDIX A 
STATISTICAL TESTS


A. TRANSFORMATION OF $N(\mu, \Sigma)$ TO $N(0, I)$

If $Y$ is a multivariate normal random variable with
mean vector, $\mu$, and covariance matrix, $\Sigma$, then

$$Z = P(Y - \mu)$$

is multivariate normally distributed with mean vector 0
and covariance matrix, $P \Sigma P'$ [Ref. 8]. In addition, since $\Sigma$
is the symmetric matrix of a positive definite quadratic
form, there exists an orthogonal matrix, $B$, such that

$$B \Sigma B' = D$$

where $D$ is a diagonal matrix with all diagonal elements
positive [Ref. 9]. The matrix, $B$, can be constructed using
the characteristic vectors of $\Sigma$ as the rows of $B$ (equivalently,
as the columns of $B'$). Further, if the matrix $C$ is defined as
the diagonal matrix with diagonal elements equal to the
inverse of the square root of the corresponding element of $D$, then

$$C B \Sigma B' C' = C D C' = I$$

where $I$ is the identity matrix. Thus, if we set

$$P = CB$$

then

$$Z = P(Y - \mu)$$

is multivariate normal with zero mean vector and identity
covariance matrix, and the elements of the vector $Z$ are
mutually stochastically independent standard normal random
variables.
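A minimal numerical sketch of this construction follows. It is restricted to a 2 x 2 covariance matrix so that the characteristic vectors have a closed form and only the Python standard library is needed. The function names and the example matrix are hypothetical and are not taken from the data of this study.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def whitening_matrix_2x2(sigma):
    """Return P = C B for a 2x2 positive definite covariance matrix."""
    (a, b), (_, d) = sigma
    # Rotation angle that diagonalizes a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    # The rows of B are the characteristic vectors, so B sigma B' = D.
    B = [[c, s], [-s, c]]
    # Diagonal elements of D (the characteristic roots).
    d1 = a * c * c + 2.0 * b * c * s + d * s * s
    d2 = a * s * s - 2.0 * b * c * s + d * c * c
    # C is diagonal, holding the inverse square roots of the elements of D.
    C = [[1.0 / math.sqrt(d1), 0.0], [0.0, 1.0 / math.sqrt(d2)]]
    return matmul(C, B)


sigma = [[4.0, 1.2], [1.2, 1.0]]                 # hypothetical covariance matrix
P = whitening_matrix_2x2(sigma)
check = matmul(matmul(P, sigma), transpose(P))   # should be close to the identity
print(check)
```

For more than two components the same steps apply, with the characteristic vectors obtained from a numerical eigenvalue routine instead of the closed-form rotation.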


B. KOLMOGOROV-SMIRNOV TEST OF GOODNESS OF FIT 

Let $F(x)$ be defined as the cumulative distribution
function of the random variable $X$ which is $N(0,1)$. In
addition, define $S_n(x)$ to be the sample cumulative
distribution function based on a set of n observations of a
random variable assumed to be $N(0,1)$. Then

$$S_n(x) = k/n$$

where k is the number of observations in the sample which
are less than or equal to x. Then, the Kolmogorov-Smirnov
test statistic D is defined as

$$D = \max_x |F(x) - S_n(x)|.$$

Observed values of D can be compared with its tabled
distribution to determine the acceptability of the normal
hypothesis [Ref. 10].
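The statistic D can be computed directly from its definition, as in the following sketch. The sample values are hypothetical. Because the sample distribution function is a step function, the largest deviation from F(x) occurs at an observed value, so the code checks both sides of each step.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function F(x) of N(0,1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample):
    """Kolmogorov-Smirnov D = max |F(x) - S_n(x)| against N(0,1)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x)
        # S_n jumps from (i-1)/n to i/n at x; compare F with both levels.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]   # hypothetical standardized readings
D = ks_statistic(sample)
print(D)
```

The resulting value of D would then be compared with the tabled critical value for the sample size n.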


C. TEST FOR EQUALITY OF SEVERAL COVARIANCE MATRICES

Suppose $Y_1, Y_2, \ldots, Y_q$ are p-component multivariate
normal random variables with distribution denoted $N(\mu_i, \Sigma_i)$,
$i = 1,2,\ldots,q$. In order to test the hypothesis

$$H: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_q$$

based on q samples of size $N_i$ from the distribution of $Y_i$,
$i = 1,2,\ldots,q$, let the following quantities be defined:

$$n_i = N_i - 1, \quad i = 1,2,\ldots,q$$

$$n = \sum_{i=1}^{q} n_i$$

$$A_i = \sum_{k=1}^{N_i} (Y_{i,k} - \bar{Y}_i)(Y_{i,k} - \bar{Y}_i)', \quad i = 1,2,\ldots,q$$

and

$$A = \sum_{i=1}^{q} A_i.$$

Then the test statistic is

$$W = k \sum_{i=1}^{q} n_i \left( \log|\hat{\Sigma}| - \log|\hat{\Sigma}_i| \right)$$

where

$$k = 1 - \left( \sum_{i=1}^{q} \frac{1}{n_i} - \frac{1}{n} \right) \frac{2p^2 + 3p - 1}{6(p+1)(q-1)},$$

$$\hat{\Sigma} = A/n$$

and

$$\hat{\Sigma}_i = A_i/n_i, \quad i = 1,2,\ldots,q.$$

Asymptotic expansion of the distribution of this test
statistic results in a distribution described by the
following probability statement for W,

$$\Pr(W \le w) = \Pr(\chi^2_f \le w) + c\,[\Pr(\chi^2_{f+4} \le w) - \Pr(\chi^2_f \le w)] + O(n^{-3})$$

where

$$c = \frac{p(p+1)\left[ (p-1)(p+2)\left( \sum_{i=1}^{q} \frac{1}{n_i^2} - \frac{1}{n^2} \right) - 6(q-1)(1-k)^2 \right]}{48k^2},$$

$$f = \tfrac{1}{2}(q-1)(p+1)p$$

and $\chi^2_f$ denotes a chi-square random variable with f degrees
of freedom [Ref. 8].

This test was applied to the Air Force data as described
in III C, with the resulting values of W = 478.5, f = 252
and c = 5.75. The test statistic is extremely significant
at the .01 level and the hypothesis was rejected.

In addition, the test was used on the operational data
as described in IV C 2 where now

$$\hat{\Sigma} = A/(n-q)$$

and

$$\hat{\Sigma}_i = A_i/(n_i - 1), \quad i = 1,2,\ldots,q$$

instead of as defined above. The results for this test
were W = 936.2, f = 728 and c = 14.9. Once again the test
statistic is extremely significant at the .01 level and the
hypothesis was rejected.
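The constants of the chi-square approximation depend only on p, q and the sample sizes, so they can be checked without the raw data. The sketch below assumes that k, f and c take the forms given for this test in Ref. 8 (k the scale factor of W, f the degrees of freedom, c the coefficient of the correction term). The values p = 7 and q = 10 or 27 are inferred here because they reproduce the reported f = 252 and f = 728, and the common group size of 8 is purely hypothetical.

```python
def covariance_test_constants(p, n_list):
    """Return (k, f, c) for the test of equality of q covariance matrices.

    p      -- number of components in each observation vector
    n_list -- the values n_i = N_i - 1, one per group
    """
    q = len(n_list)
    n = sum(n_list)
    sum_inv = sum(1.0 / ni for ni in n_list)
    sum_inv_sq = sum(1.0 / ni ** 2 for ni in n_list)
    k = 1.0 - (sum_inv - 1.0 / n) * (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (q - 1))
    f = (q - 1) * p * (p + 1) // 2          # p(p+1) is always even
    c = (p * (p + 1)
         * ((p - 1) * (p + 2) * (sum_inv_sq - 1.0 / n ** 2)
            - 6.0 * (q - 1) * (1.0 - k) ** 2)
         / (48.0 * k * k))
    return k, f, c


# Hypothetical group sizes: ten groups with n_i = 8 each, seven components.
k, f, c = covariance_test_constants(p=7, n_list=[8] * 10)
print(k, f, c)
```

With unequal group sizes the values of k and c change, but f depends only on p and q.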




D. TEST FOR INDEPENDENCE AMONG A SET OF NORMAL VARIATES 
If the p-component vector $Y$ has a multivariate normal
distribution described by $N(\mu, \Sigma)$, then the test of zero
covariance,

$$\sigma_{ij} = 0$$

for all $i \ne j$ and $i,j = 1,2,\ldots,p$ is equivalent to a test
of stochastic independence of the components of $Y$.

Suppose a sample, $Y_1, \ldots, Y_n$, of n observations from the
distribution of $Y$ is obtained, then let $V$ equal the
determinant of the correlation matrix, $R$, defined by equation (2).
The asymptotic expansion of the distribution of $V$ results
in the probability statement [Ref. 8],

$$\Pr(-m \log V \le v) = \Pr(\chi^2_f \le v) + \frac{\gamma_2}{m^2}\,[\Pr(\chi^2_{f+4} \le v) - \Pr(\chi^2_f \le v)] + O(m^{-3})$$

where

$$m = n - \frac{2p+11}{6},$$

$$f = \tfrac{1}{2}p(p-1),$$

and

$$\gamma_2 = \frac{p(p-1)(2p^2 - 2p - 13)}{288}.$$

This test was applied to the data for the Pensacola
laboratory accumulated in the Air Force experiment as
described in section III D. Table 2 gives the numerical
results for the ten groups of samples tested and the test
outcomes for a level of significance of .10.


TABLE 2


RESULTS OF TEST FOR INDEPENDENCE


f = 21     m = [illegible]     $\gamma_2$ = 10.35     p = 7     $\alpha$ = .10

Sample Group     Test Statistic     Result
     1               26.20          Accept
     2            [illegible]       Reject
     3            [illegible]       Reject
     4            [illegible]       Reject
     5            [illegible]       Reject
     6               36.36          Reject
     7            [illegible]       Accept
     8               41.60          Reject
     9            [illegible]       Accept
    10            [illegible]       Reject
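The constants of the independence test and the accept or reject decisions can be reproduced with a short calculation. The sketch below assumes p = 7 components and uses a hypothetical sample size n = 30. It applies only the leading chi-square term of the expansion, ignoring the correction of order $m^{-2}$, and the critical value 29.615 is the tabled upper 10 percent point of chi-square with 21 degrees of freedom.

```python
def independence_test_constants(p, n):
    """Return (m, f, gamma2) for the test of independence among p variates."""
    m = n - (2 * p + 11) / 6.0
    f = p * (p - 1) // 2
    gamma2 = p * (p - 1) * (2 * p * p - 2 * p - 13) / 288.0
    return m, f, gamma2


CHI2_21_UPPER_10 = 29.615   # tabled chi-square critical value, 21 d.f., upper 10%


def decide(statistic, critical=CHI2_21_UPPER_10):
    """Leading-term decision: reject independence when -m log V is too large."""
    return "Reject" if statistic > critical else "Accept"


m, f, gamma2 = independence_test_constants(p=7, n=30)   # n = 30 is hypothetical
print(f, round(gamma2, 2))
for value in (26.20, 36.36, 41.60):    # statistics that are legible in Table 2
    print(value, decide(value))
```

The three legible statistics give the same outcomes as those recorded in Table 2 for groups 1, 6 and 8.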
APPENDIX B
REGRESSION: ESTIMATION, TESTS AND PREDICTION


A. ESTIMATION OF REGRESSION PARAMETERS 

Let $Y$ be a seven-component random vector with
distribution $N(BX, \Sigma)$, where $B$ is an unknown $7 \times 2$ matrix of
coefficients, $X$ is of the form $(1,x)'$, and $\Sigma$ is an unknown
covariance matrix which is constant for all values of $x$.
The unbiased estimate of $B$ [Ref. 8] based on a
sample of size n with $Y_i$ from $N(BX_i, \Sigma)$, $i = 1,\ldots,n$, is

$$\hat{B} = C A^{-1}$$

where

$$C = \sum_{i=1}^{n} Y_i X_i'$$

and

$$A = \sum_{i=1}^{n} X_i X_i'.$$

This estimate is normally distributed with mean matrix, $B$,
and covariance matrix $\Sigma \otimes A^{-1}$, where the symbol, $\otimes$, denotes
the Kronecker product [Ref. 11]. Further, the unbiased
estimate of $\Sigma$ [Ref. 8] is

$$\hat{\Sigma} = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{B}X_i)(Y_i - \hat{B}X_i)'$$

and $(n-2)\hat{\Sigma}$ has a Wishart distribution with parameters $\Sigma$
and $n-2$.
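The estimates can be computed with elementary arithmetic because the matrix of sums of products of the regressors is only 2 x 2. The following sketch uses two response components instead of seven to keep the example short. The function name and the data are hypothetical.

```python
def estimate_regression(xs, ys):
    """Estimate B (p x 2) and Sigma (p x p) for Y_i ~ N(B X_i, Sigma), X_i = (1, x_i)'.

    xs -- list of scalar regressor values x_i
    ys -- list of response vectors Y_i, each of length p
    """
    n, p = len(xs), len(ys[0])
    # A = sum of X_i X_i', a symmetric 2x2 matrix, inverted in closed form.
    a11, a12, a22 = float(n), sum(xs), sum(x * x for x in xs)
    det = a11 * a22 - a12 * a12
    a_inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    # C = sum of Y_i X_i', a p x 2 matrix.
    c = [[sum(y[j] for y in ys), sum(y[j] * x for x, y in zip(xs, ys))]
         for j in range(p)]
    # B_hat = C A^{-1}; column 0 holds intercepts, column 1 holds slopes.
    b_hat = [[row[0] * a_inv[0][k] + row[1] * a_inv[1][k] for k in range(2)]
             for row in c]
    # Sigma_hat = residual sums of products divided by n - 2.
    resid = [[y[j] - (b_hat[j][0] + b_hat[j][1] * x) for j in range(p)]
             for x, y in zip(xs, ys)]
    sigma_hat = [[sum(r[i] * r[j] for r in resid) / (n - 2) for j in range(p)]
                 for i in range(p)]
    return b_hat, sigma_hat


# Hypothetical data: regressor values and two-component readings.
xs = [10.0, 20.0, 30.0, 40.0, 50.0]
ys = [[6.2, 1.1], [10.9, 0.2], [16.1, -1.2], [20.8, -1.9], [26.0, -3.1]]
b_hat, sigma_hat = estimate_regression(xs, ys)
print(b_hat)
print(sigma_hat)
```

Each row of the estimated coefficient matrix is the ordinary least-squares line for one component, so the multivariate estimate can be checked component by component.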
B. TEST OF THE VALUE OF REGRESSION COEFFICIENTS

Suppose the $7 \times 2$ matrix $B$ is partitioned in the form

$$B = (B_1, B_2).$$

Then for any non-null vector of constants, $C$, the hypothesis

$$C'B_2 = C'B^*$$

can be tested using an F statistic [Ref. 12]. By choosing
$C$ to be the vector with 1 as the ith component and all other
components zero, and the vector $B^*$ to be the null vector,
the hypotheses

$$\beta_{i2} = 0, \quad i = 1,2,\ldots,7$$

can be tested. With this selection, the standard t-test
of the slope of a regression line [Refs. 6 and 7] can be
used. In this way, each of the components of $B_2$ can be
tested individually, each with an assigned level of
significance, $\alpha$.
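For a single component, the t-test of the slope reduces to the familiar simple-regression calculation, sketched below. The function name and the data are hypothetical. The resulting statistic would be compared with the tabled t distribution with n - 2 degrees of freedom.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (slope, t) for testing that the slope of y on x is zero."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual variance estimate with n - 2 degrees of freedom.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    # The estimated standard error of the slope is sqrt(s2 / sxx).
    return slope, slope / math.sqrt(s2 / sxx)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical regressor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]         # hypothetical readings for one wear metal
slope, t = slope_t_statistic(xs, ys)
print(slope, t)
```

Repeating the calculation for each of the seven components gives the seven individual tests described above.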


C. CONSTRUCTION OF THE REGION, $R_\alpha(x)$

If the matrices $B$ and $\Sigma$ are estimated with a sample of
size n in the manner described above, and if $Y$ is a new
observation from $N(BX^*, \Sigma)$, the confidence region $R_\alpha(x)$ can
be constructed so that $Y$ will be in $R_\alpha(x)$ with probability
at least $1-\alpha$. The estimate of the mean of $Y$ is

$$\hat{Y} = \hat{B}X^*$$

and has covariance matrix,

$$S = T(A^{-1} \otimes \hat{\Sigma})T'$$

where

$$A = \sum_{i=1}^{n} X_i X_i'$$

and

$$T = X^{*\prime} \otimes I.$$

Thus,

$$(\hat{Y} - BX^*)' S^{-1} (\hat{Y} - BX^*)$$

has Hotelling's $T^2$ distribution [Refs. 8 and 5]. Hence, the
set of vectors, $m$, satisfying

$$(\hat{Y} - m)' S^{-1} (\hat{Y} - m) \le T^2(\alpha)$$

comprise a $100(1-\alpha)\%$ confidence region for $BX^*$ [Refs. 8 and
5]. Since the region, $R_\alpha(x)$, is to place bounds on $Y$, which
has covariance matrix, $\Sigma$, it can be defined by

$$(\hat{Y} - m)' (S + \hat{\Sigma})^{-1} (\hat{Y} - m) \le T^2(\alpha).$$

The set of all vectors, $m$, satisfying this constraint form
a confidence region, $R_\alpha(x)$, which will contain $Y$ with
probability at least $1 - \alpha$ if $Y$ came from $N(BX^*, \Sigma)$.
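Checking whether a new reading falls in the region amounts to evaluating one quadratic form. The sketch below relies on the mixed-product property of the Kronecker product, under which the covariance matrix of the estimated mean reduces to the scalar $X^{*\prime} A^{-1} X^*$ times the estimated covariance matrix, so the matrix to be inverted is that estimate scaled by one plus the scalar. The scalar is called `leverage` here. The function names, the numerical inputs and the critical value are hypothetical, and the critical value would in practice come from tables of Hotelling's $T^2$.

```python
def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def region_statistic(y_new, y_hat, sigma_hat, leverage):
    """Quadratic form (y_hat - y_new)'(S + Sigma_hat)^{-1}(y_hat - y_new).

    leverage is the scalar X*' A^{-1} X*, so that S = leverage * Sigma_hat.
    """
    diff = [a - b for a, b in zip(y_hat, y_new)]
    scaled = [[(1.0 + leverage) * v for v in row] for row in sigma_hat]
    z = solve(scaled, diff)
    return sum(d * zi for d, zi in zip(diff, z))


# Hypothetical two-component example.
stat = region_statistic(y_new=[12.0, 3.5], y_hat=[10.0, 3.0],
                        sigma_hat=[[2.0, 0.5], [0.5, 1.0]], leverage=0.25)
in_region = stat <= 9.0    # 9.0 stands in for a tabled critical value
print(stat, in_region)
```

Solving the linear system instead of forming the inverse explicitly keeps the calculation short and applies unchanged to the seven-component case.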
SAMPLE CORRELATION MATRICES